PCT

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification 6:
C07K 14/59, 16/26, C12P 21/00, C12N 5/10, 1/21, 15/16, 15/63

(11) International Publication Number: WO 96/05224 A1

(43) International Publication Date: 22 February 1996 (22.02.96) 



(21) International Application Number: PCT/US95/09664 

(22) International Filing Date: 1 August 1995 (01.08.95)



(30) Priority Data:
    08/289,396    12 August 1994 (12.08.94)       US
    08/310,590    22 September 1994 (22.09.94)    US
    08/334,629    4 November 1994 (04.11.94)      US
    08/351,591    7 December 1994 (07.12.94)      US



(71) Applicant: WASHINGTON UNIVERSITY [US/US]; One
Brookings Drive, St. Louis, MO 63130 (US).

(72) Inventor: BOIME, Irving; 27 Oak Park Drive, St. Louis, MO
63141 (US).

(74) Agents: MURASHIGE, Kate, H. et al.; Morrison & Foerster,
2000 Pennsylvania Avenue, N.W., Washington, DC 20006
(US).



(81) Designated States: AU, BR, CA, CN, FI, HU, JP, KR, MX,
NO, RU, European patent (AT, BE, CH, DE, DK, ES, FR,
GB, GR, IE, IT, LU, MC, NL, PT, SE).



Published 

With international search report. 



(54) Title: SINGLE-CHAIN FORMS OF THE GLYCOPROTEIN HORMONE QUARTET

(57) Abstract

Single-chain forms of the glycoprotein hormone quartet, at least some members of which are found in most vertebrates, are disclosed.
In one embodiment of these single-chain forms, the α and β subunits of the wild-type heterodimers or their variants are covalently linked,
optionally through a linker moiety. A drug may further be included within the linker moiety to be targeted to receptors for these hormones.
Some of the single-chain forms are agonists and others antagonists of the glycoprotein hormone activity. Another embodiment of the
single-chain compounds of the invention comprises two β subunits of the glycoprotein hormones, which β subunits are the same or different.
These forms are antagonists of glycoprotein hormone activity.





SINGLE-CHAIN FORMS OF THE GLYCOPROTEIN HORMONE QUARTET



Acknowledgment of Government Support

This invention was made with government support under
NIH Contract No. N01-HD-9-2922, awarded by the National
Institutes of Health. The government has certain rights in this
invention.

Technical Field 

The invention relates to the field of protein
engineering and the glycoprotein hormones which occur normally
as heterodimers. More specifically, the invention concerns
single-chain forms of chorionic gonadotropin (CG), thyroid
stimulating hormone (TSH), luteinizing hormone (LH), and
follicle stimulating hormone (FSH).

Background Art 

In humans, four important glycoprotein hormone
heterodimers (LH, FSH, TSH and CG) have identical α subunits and
differing β subunits. Three of these hormones are present in
virtually all other vertebrate species as well; CG has so far
been found only in primates and in horse placenta and urine.

PCT application WO90/09800, published 7 September
1990, and incorporated herein by reference, describes a number
of modified forms of these hormones. One important modification
is C-terminal extension of the β subunit by the carboxy terminal
peptide (CTP) of human chorionic gonadotropin or a variant thereof.
Other muteins of these hormones are also described. The
relevant positions for the CTP are from any one of positions
112-118 to position 145 of the β subunit of human chorionic
gonadotropin. The PCT application describes variants of the CTP
extension obtained by conservative amino acid substitutions such
that the capacity of the CTP to alter the clearance
characteristics is not destroyed. In addition, U.S. Serial No.
08/049,869 filed 20 April 1993, incorporated herein by
reference, describes modifying these hormones by extension or
insertion of the CTP at locations other than the C-terminus and
CTP fragments shorter than the sequence extending from positions
112-118 to 145.

The CTP-extended β subunit of FSH is also described in
two papers by applicants herein: LaPolt, P.S. et al.,
Endocrinology (1992) 131:2514-2520 and Fares, F.A. et al., Proc
Natl Acad Sci USA (1992) 89:4304-4308. Both of these papers are
incorporated herein by reference.

The crystal structure of the heterodimeric form of
human chorionic gonadotropin has now been published in more or
less contemporaneous articles: one by Lapthorn, A.J. et al.,
Nature (1994) 369:455-461 and the other by Wu, H. et al.,
Structure (1994) 2:545-558. The results of these articles are
summarized by Patel, D.J., Nature (1994) 369:438-439.

At least one instance of preparing a successful
single-chain form of a heterodimer is now known. The naturally
occurring sweetener protein, monellin, is isolated from
serendipity berries in a heterodimeric form. Studies on the
crystal structure of the heterodimer were consistent with the
proposition that the C-terminus of the B chain could be linked
to the N-terminus of the A chain through a linker which
preserved the spatial characteristics of the heterodimeric form.
Such a linkage is advantageous because a sweetener protein
should be stable at high temperatures. This was successfully
achieved by preparing the single-chain form, thus impeding heat
denaturation, as described in U.S. patent 5,264,558.

PCT application WO91/16922, published 14 November 1991,
describes a multiplicity of chimeric and otherwise modified
forms of the heterodimeric glycoprotein hormones. In general,
the disclosure is focused on chimeras of α subunits or β
subunits involving portions of various α or β chains
respectively. One construct simply listed in this application,
and not otherwise described, fuses substantially all of the β
chain of human chorionic gonadotropin to the α subunit
preprotein, i.e., including the secretory signal sequence for
this subunit. This construct falls outside the scope of the
present invention since the signal sequence
intervening between the β and α chains fails to serve as a
linker moiety as defined and described herein.

It has now been found that the normally heterodimeric
glycoprotein hormones retain their properties when in
single-chain form, including single-chain forms that contain the
various CTP extensions and insertions described above.

Disclosure of the Invention

The invention provides single-chain forms of the
glycoprotein hormones, at least some of which hormones are found
in most vertebrate species. The single-chain forms of the
invention may be glycosylated, partially glycosylated, or
nonglycosylated, and the α and β chains (or α and α or β and β)
that occur in the native glycoprotein hormones or variants of
them may optionally be linked through a linker moiety.
Particularly preferred linker moieties include the carboxy
terminal peptide (CTP) unit either as a complete unit or only as
a portion thereof. The resulting single-chain hormones either
retain the activity of the unmodified heterodimeric form or are
antagonists of this activity.

Thus, in one aspect, the invention is directed to a
glycosylated or nonglycosylated protein which comprises the
amino acid sequence of the α subunit common to the glycoprotein
hormones linked covalently, optionally through a linker moiety,
to the amino acid sequence of the β subunit of one of said
hormones, or variants of said amino acid sequences wherein said
variants are defined herein.

In another aspect, the invention is directed to a
glycosylated or nonglycosylated protein which comprises the
amino acid sequence of the β subunit of a member of the
glycoprotein hormone quartet linked covalently, optionally
through a linker moiety, to the amino acid sequence of the β
subunit of one of said hormones, or variants of said amino acid
sequences wherein said variants are defined herein.

In another aspect, the invention is directed to a
glycosylated or nonglycosylated protein which comprises the
amino acid sequence of the α subunit of the glycoprotein hormone
quartet linked covalently, optionally through a linker moiety,
to the amino acid sequence of another α subunit, or variants of
said amino acid sequences wherein said variants are defined
herein.

In still another aspect, the invention is directed to
glycosylated or nonglycosylated single-chain forms of the
biologically important dimers whose efficacy is presaged by the
single-chain forms of the hormone quartet. Thus, the invention
is also directed to the single-chain forms of interleukins 3 and
12 (IL-3 and IL-12), tumor necrosis factor (TNF), transforming
growth factor (TGF) as well as inhibin. Also included are
hybrid interleukins such as single-chain forms of one subunit
from IL-3 and the other from IL-12.

In other aspects, the invention is directed to
recombinant materials and methods to produce the single-chain
proteins of the invention; to pharmaceutical compositions
containing them; to antibodies specific for them; and to methods
for their use.

Brief Description of the Drawings 

Figure 1 shows the construction of a SalI-bounded DNA
fragment fusing the third exon of CGβ with the second exon
encoding the α subunit.

Figure 2 shows the amino acid sequence and numbering
of positions 112-145 of human CGβ.

Figure 3 shows the results of a competition binding
assay for FSH receptor by various FSH analogs.

Figure 4 shows the results of signal transduction
assay with respect to FSH receptor for various FSH analogs.

Modes of Carrying Out the Invention

Four "glycoprotein" hormones in humans provide a
family which includes human chorionic gonadotropin (hCG),
follicle stimulating hormone (FSH), luteinizing hormone (LH),
and thyroid stimulating hormone (TSH). As used herein,
"glycoprotein hormones" refers to the members of this family,
whether found in humans or in other vertebrates. All of these
hormones are heterodimers comprised of α subunits which, for a
given species, are identical in amino acid sequence among the
group, and β subunits which differ according to the member of
the family. Thus, normally these glycoprotein hormones occur as
heterodimers composed of α and β subunits associated with each
other but not covalently linked. Most vertebrates produce FSH,
TSH and LH; chorionic gonadotropin has been found only in
primates, including humans, and horses. A specific form of CG
from horses has been designated pregnant mare serum glycoprotein
(PMSG).

Thus, this hormone "quartet" is composed of
heterodimers wherein the α and β subunits of each are encoded in
different genes and are separately synthesized by the host. The
host then assembles the separately synthesized subunits into a
non-covalently linked heterodimeric complex. In this manner,
the heterodimers of this hormone quartet differ from
heterodimers such as insulin, which is synthesized from a single
gene (in this case with an intervening "pro" sequence) and whose
subunits are covalently coupled using disulfide linkages. This
hormone quartet is also distinct from the immunoglobulins, which
are assembled from different loci but are covalently bound
through disulfide linkages. On the other hand, monellin, which
is, however, a plant protein, is held together through
noncovalent interaction between its A and B chains. It is not
known at present whether the two chains are encoded on separate
genes.

Thus, a variety of factors is influential in
determining the behavior of biologically active compounds which
are dimers formed from subunits that are identical or different.
The subunits may be covalently or noncovalently linked; they may
be synthesized by the same or different genes; and they may or
may not contain, in their precursor forms, a "pro" sequence
linking the two members of the dimer. Based on the results
obtained with the single-chain forms of the glycoprotein hormone
quartet herein, it is apparent that single-chain forms of the
biologically active dimers interleukin-12, interleukin-3 (IL-12
and IL-3), inhibin, tumor necrosis factor (TNF), and
transforming growth factor (TGF) will also be biologically
active.

The single-chain forms of the heterodimers or
homodimers have a number of advantages over their dimeric forms.
First, they are generally more stable. LH, in particular, is
noted for its instability and short half-life. Second, problems
of recombinant production are reduced since only a single gene
need be transcribed, translated and processed. This is
particularly important for expression in bacteria. Third, of
course, they provide an alternate form, thus permitting fine
tuning of activity levels and of in vivo half-lives. Finally,
single-chain forms are unique starting materials for identifying
truncated forms with the activity of the dimer. The linkage
between the subunits permits the protein to be engineered
without disturbing the overall folding of the protein.

Features of the Members of the Quartet 

The β subunit of hCG is substantially larger than the
other β subunits in that it contains approximately 34 additional
amino acids at the C-terminus, referred to herein as the carboxy
terminal portion (CTP), which, when glycosylated at the O-linked
sites, is considered responsible for the comparatively longer
serum half-life of hCG as compared to other gonadotropins
(Matzuk, M. et al., Endocrinol (1989) 126:376). In the native
hormone, this CTP extension contains four mucin-like O-linked
oligosaccharides.

In one embodiment of the present invention, the α and
β chains of the glycoprotein hormones are coupled into a
single-chain proteinaceous material where the α and β chains are
covalently linked, optionally through a linker moiety. The
linker moiety may include further amino acid sequence, and in
particular the CTP units described herein can be advantageously
included in the linker. In addition, the linker may include
peptide or nonpeptide drugs which can be targeted to the
receptors for the hormones.

In addition to the head-to-tail configuration that is
achievable by simply coupling the two peptide chains through a
peptide bond, the α and β chains can be linked head-to-head or
tail-to-tail. Head-to-head and tail-to-tail couplings involve
synthetic chemistry using standard techniques to link two
carboxyl or two amino groups through a linker moiety. For
example, two amino groups may be linked through an anhydride or
through any dicarboxylic acid derivative; two carboxyl groups
can be linked through diamines or diols using standard
activation techniques. However, the most preferred form is a
head-to-tail configuration wherein standard peptide linkages
suffice and the single-chain compound can be prepared as a
fusion protein recombinantly or using synthetic peptide
techniques either in a single chain or, preferably, ligating
individual portions of the entire sequence. Of course, if
desired, peptide or non-peptide linker moieties can be used in
this case as well, but this is unnecessary, and the convenience
of recombinant production of the single-chain protein would
suggest that embodiments that permit this method of production
comprise by far the most preferred approach.

When a head-to-tail configuration is employed, linkers
may consist essentially of additional peptide sequence. As is
the case with the heterodimers, the two β chains may be linked
through a CTP unit as further described below. Thus, possible
embodiments of the invention include, with the N-terminus at the
left, α-FSHβ, βFSH-α, α-βLH, α-CTP-βLH, βLH-CTP-α,
CTP-βLH-CTP-α, and the like.

The single-chain forms of the heterodimeric
gonadotropins or glycoprotein quartet also relate to additional
important sets of embodiments wherein, rather than coupling the α
and β subunits, two β or two α subunits may be coupled together
to form a single-chain compound. As with the heterodimer, the
coupling can be head-to-head, head-to-tail, or tail-to-tail.

The "two-β" single-chain tandem peptides are
especially useful as antagonists for the receptors normally
activated by the heterodimeric glycoprotein hormones. Since the
α subunit is believed largely responsible for signal
transduction, while the β subunit confers receptor specificity, and
since the α and β subunits have similar conformations, the
single-chain compounds should be able to bind specifically a
receptor for which at least one β chain is present without
activating the receptor.

The antagonist activity of the "2-β" single-chain
tandem peptides is based in part on the crystal structure of the
heterodimers. It is noted that the α and β chains have similar
cystine-knot configurations and that some of the folding
patterns of the two chains are analogous.

The "two-β" single-chain compounds of the invention
may be designed to contain tandem copies of the same β
subunit, i.e., FSHβ-FSHβ; HCGβ-HCGβ; TSHβ-TSHβ; or LHβ-LHβ; or
chimeric single-chain compounds may be employed such as
HCGβ-FSHβ; FSHβ-LHβ; LHβ-TSHβ and the like. There are a total of 12
such possible combinations. In addition, the carboxyl terminal
peptide (CTP) of the HCG β subunit improves the conformation of
the single-chain compound when present between the two β chains.
This is automatic when HCGβ is the upstream portion; however, in
other instances, it is convenient to employ a CTP unit as
described herein at the carboxyl terminus of the upstream
participant. Two such CTP units are also included within the
scope of the invention. Thus, preferred embodiments include
FSHβ-CTP-FSHβ; FSHβ-CTP-CTP-FSHβ; LHβ-CTP-FSHβ; LHβ-CTP-CTP-FSHβ
and the like.
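
The count of twelve chimeric pairings follows if the upstream (N-terminal) and downstream positions are treated as distinct, so that each ordered pair of two different β subunits is counted once. A minimal Python sketch under that assumption; the names `BETA_SUBUNITS`, `homotandems` and `chimeric_tandems` are illustrative and do not appear in the specification:

```python
from itertools import permutations

# The four beta subunits of the glycoprotein hormone quartet.
BETA_SUBUNITS = ("FSHβ", "HCGβ", "TSHβ", "LHβ")


def homotandems():
    """Tandem copies of the same beta subunit, e.g. FSHβ-FSHβ."""
    return [f"{b}-{b}" for b in BETA_SUBUNITS]


def chimeric_tandems():
    """Ordered pairs of two different beta subunits, upstream member first."""
    return [f"{up}-{down}" for up, down in permutations(BETA_SUBUNITS, 2)]


if __name__ == "__main__":
    # Four homotandems and twelve ordered chimeric pairs.
    print(len(homotandems()), len(chimeric_tandems()))
```

Because order matters here, HCGβ-FSHβ and FSHβ-HCGβ are counted as different constructs, which is consistent with the remark that a CTP is automatically present between the chains only when HCGβ is the upstream portion.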

Similar descriptions apply to the "two-α" single-chain
compounds, except, of course, that chimeric pairs are not
included other than with respect to α variants. Various
linkers, preferably CTP-based, and CTP extensions are also
included.

The following definitions may be helpful in describing
the single-chain forms of the molecules.

As used herein, α subunit, and FSH, LH, TSH, and CG β
subunits as well as the heterodimeric forms have in general
their conventional definitions and refer to the proteins having
the amino acid sequences known in the art per se, or allelic
variants thereof, regardless of the glycosylation pattern
exhibited.

"Native" forms of these peptides are those which have
the amino acid sequences isolated from the relevant vertebrate
tissue, and have these known sequences per se, or their allelic
variants.

"Variant" forms of these proteins are those which have
deliberate alterations in amino acid sequence of the native
protein produced by, for example, site-specific mutagenesis or
by other recombinant manipulations, or which are prepared
synthetically.

These alterations consist of 1-10, preferably 1-8, and
more preferably 1-5 amino acid changes, including deletions,
insertions, and substitutions, most preferably conservative
amino acid substitutions as defined below. The resulting
variants must retain activity which affects the corresponding
activity of the native hormone, i.e., either they must retain
the biological activity of the native hormone directly, or they
must behave as antagonists, generally by virtue of being able to
bind the receptors for the native hormones but lacking the
ability to effect signal transduction. For example, it is known
that if the glycosylation site at position 52 of the α subunit
is removed by an amino acid substitution, thereby preventing
all glycosylation at that site, the hormones which are
heterodimers with this altered α subunit are generally antagonists
and are able to bind receptors, preventing the native hormone
from doing so in competition. (On the other hand, the
glycosylation site of the α subunit at position 78 appears not
greatly to affect the activity of the hormones.) Other
alterations in the amino acid sequence may also result in
antagonist rather than agonist activity for the variant.
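
The numeric limits on amino acid changes stated above can be written as a small lookup. The sketch below is only an illustration: the function name `variant_preference` and the tier labels are assumptions rather than terms of the specification, and the separate requirement that a variant retain agonist or antagonist activity is not modeled.

```python
def variant_preference(n_changes: int) -> str:
    """Classify a variant by its number of amino acid changes.

    Deletions, insertions and substitutions are counted together.
    """
    if n_changes < 1:
        return "native"             # no deliberate alteration
    if n_changes <= 5:
        return "more preferred"     # 1-5 changes
    if n_changes <= 8:
        return "preferred"          # 6-8 changes, inside the 1-8 range
    if n_changes <= 10:
        return "within definition"  # 9-10 changes, inside the 1-10 range
    return "beyond 1-10 range"      # more than 10 changes


if __name__ == "__main__":
    for n in (0, 3, 8, 10, 11):
        print(n, variant_preference(n))
```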

One set of preferred variants are those wherein the
glycosylation sites of either the α or β subunits or both have
been altered. The α subunit contains two glycosylation sites,
one at position 52 and the other at position 78, and the effect
of alterations of these sites on activity has just been
described. Similarly, the β subunits generally contain two
N-linked glycosylation sites (at positions that vary somewhat
with the nature of the β chain) and similar alterations can be
made at these sites. The CTP extension of hCG contains four
O-linked glycosylation sites, and conservative mutations at the
serine residues (e.g., conversion of the serine to alanine)
destroy these sites. Destruction of the O-linked glycosylation
sites may effect conversion of agonist activity to antagonist
activity.

Finally, alterations in amino acid sequence that are
proximal to the N-linked or O-linked glycosylation sites
influence the nature of the glycosylation that is present on the
resulting molecule and also alter activity.

Alterations in amino acid sequence also include both
insertions and deletions. Thus, truncated forms of the hormones
are included among variants, e.g., mutants of the α subunit
which are lacking some or all of the amino acids at positions
85-92 at the C-terminus. In addition, α subunits with 1-10
amino acids deleted from the N-terminus are included. Some
useful variants of the hormone quartet described herein are set
forth in U.S. Patent 5,177,193 issued 5 January 1993 and
incorporated herein by reference. As shown therein, the
glycosylation patterns can be altered by destroying the relevant
sites or, in the alternative, by choice of host cell in which
the protein is produced.

As explained above, the single-chain forms are
convenient starting materials for various engineered muteins.
Such muteins include those with noncritical regions altered or
removed. Such deletions and alterations may comprise entire
loops, so that sequences of considerably more than 10 amino
acids may be deleted or changed. The single-chain molecules
must, however, retain at least the receptor binding domains
and/or the regions involved in signal transduction.

There is considerable literature on variants of the
hormone quartet described herein and it is clear from this
literature that a large number of possible variants which result
both in agonist and antagonist activity can be prepared. Such
variants are disclosed, for example, in Chen, F. et al., Molec
Endocrinol (1992) 6:914-919; Yoo, J. et al., J Biol Chem (1993)
268:13034-13042; Yoo, J. et al., J Biol Chem (1991)
266:17741-17743; Puett, D. et al., Glycoprotein Hormones,
Lustbader, J.W. et al., eds., Springer-Verlag, New York (1994)
122-134; Keutmann, H.T. et al. (ibid) pages 103-117; Erickson,
L.D. et al., Endocrinology (1990) 126:2555-2560; and Bielinska,
M. et al., J Cell Biol (1990) 111:330a (Abstract 1844).

As described hereinabove, one method of constructing
effective antagonists is to prepare a single-chain molecule
containing two β subunits of the same or different member of the
glycoprotein quartet. Particularly preferred variants of these
single-chain forms include those wherein one or more cystine
links are deleted, typically by substituting a neutral amino acid
for one or both cysteines which participate in the link.
Particularly preferred cystine links which may be deleted are
those between positions 26 and 110 and between positions 23 and
72.
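
Deleting a cystine link amounts to replacing one or both of the paired cysteines in the sequence. The sketch below uses 1-based positions as in the text; the function name `delete_cystine_link`, the default alanine replacement (one possible neutral amino acid) and the short toy sequence are assumptions for illustration only, and no real subunit sequence is reproduced.

```python
def delete_cystine_link(sequence: str, pos_a: int, pos_b: int,
                        replacement: str = "A", both: bool = True) -> str:
    """Replace the cysteine(s) at 1-based positions pos_a and pos_b.

    With both=False only the cysteine at pos_a is replaced, which is
    already enough to prevent the disulfide from forming.
    """
    residues = list(sequence)
    targets = (pos_a, pos_b) if both else (pos_a,)
    for pos in targets:
        if residues[pos - 1] != "C":
            raise ValueError(
                f"position {pos} is {residues[pos - 1]!r}, not cysteine")
        residues[pos - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical 8-residue sequence with cysteines at positions 2 and 7.
    toy = "GCSTLKCP"
    print(delete_cystine_link(toy, 2, 7))              # GASTLKAP
    print(delete_cystine_link(toy, 2, 7, both=False))  # GASTLKCP
```

For a real β subunit the same call would be made with the pairs named above, for example positions 26 and 110, after confirming that both positions hold cysteine in the sequence being used.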

In addition, it has been demonstrated that the β
subunits of the hormone quartet can be constructed in chimeric
forms so as to provide biological functions of both components
of the chimera, or, in general, hormones of altered biological
function. Thus, chimeric molecules which exhibit both FSH and
LH/CG activities can be constructed as described by Moyle, Proc
Natl Acad Sci (1991) 88:760-764; Moyle, Nature (1994)
368:251-255. As disclosed in these papers, substituting amino acids
101-109 of FSH-β for the corresponding residues in the CG-β
subunit yields an analog with both hCG and FSH activity.

These chimeric forms of β subunits can also be used in
the single-chain compounds which couple two β subunits into a
single molecule.

Although it is recognized that glycosylation pattern
has a profound influence on activity both qualitatively and
quantitatively, for convenience the terms FSH, LH, TSH, and CG β
subunits refer to the amino acid sequence characteristic of the
peptides, as does "α subunit." When only the β chain is
referred to, the terms will be, for example, FSHβ; when the
heterodimer is referred to, the simple term "FSH" will be used.
It will be clear from the context in what manner the
glycosylation pattern is affected by, for example, recombinant
expression host or alteration in the glycosylation sites. Forms
of the glycoprotein with specified glycosylation patterns will
be so noted.

As used herein "peptide" and "protein" are used
interchangeably, since the length distinction between them is
arbitrary.

In the single-chain forms of the present invention,
the α and/or β chain may contain a CTP extension inserted into a
noncritical region.

"Noncritical" regions of the α and β subunits are
those regions of the molecules not required for biological
activity (including agonist and antagonist activity). In
general, these regions are removed from binding sites, precursor
cleavage sites, and catalytic regions. Regions critical for
inducing proper folding, binding to receptors, catalytic
activity and the like should be avoided; similarly, regions
which are critical to assure the three-dimensional conformation
of the protein should be avoided. It should be noted that some
of the regions which are critical in the case of the dimer
become noncritical in the single-chain forms since the
conformational restriction imposed by the single chain may
obviate the necessity for these regions. The ascertainment of
noncritical regions is readily accomplished by deleting or
modifying candidate regions and conducting an appropriate assay
for the desired activity. Regions where modifications result in
loss of activity are critical; regions wherein the alteration
results in the same or similar activity (including antagonist
activity) are considered noncritical.

It should be emphasized that by "biological activity"
is meant activity which is either agonistic or antagonistic to
that of the native hormones. Thus, certain regions are critical
for behavior of a variant as an antagonist, even though the
antagonist is unable to directly provide the physiological
effect of the hormone.

For example, for the α subunit, positions 33-59 are
thought to be necessary for signal transduction and the 20 amino
acid stretch at the carboxy terminus is needed for signal
transduction/receptor binding. Residues critical for assembly
with the β subunit include at least residues 33-58, particularly
37-40.

Where the noncritical region is "proximal" to the N-
or C-terminus, the insertion is at any location within 10 amino
acids of the terminus, preferably within 5 amino acids, and most
preferably at the terminus per se.

In general, "proximal" is used to indicate a position
which is within 10 amino acids, preferably within five amino
acids, of a referent position, and most preferably at the
referent position per se. Thus, certain variants may contain
substitutions of amino acids "proximal" to a glycosylation site;
the definition is relevant here. In addition, the α and β
subunits may be linked to each other at positions "proximal" to
their N- or C-termini.
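
The definition of "proximal" reduces to a distance check along the sequence. A minimal sketch, assuming that "within 10 amino acids" means a positional difference of at most 10 counted inclusively; the function name `is_proximal` and its `window` parameter are illustrative:

```python
def is_proximal(position: int, referent: int, window: int = 10) -> bool:
    """True if position lies within `window` residues of referent.

    window=10 is the general definition, window=5 the preferred range,
    and window=0 means the referent position itself.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    return abs(position - referent) <= window


if __name__ == "__main__":
    # Residue 60 relative to the glycosylation site at position 52.
    print(is_proximal(60, 52))            # True  (8 residues away)
    print(is_proximal(60, 52, window=5))  # False (outside preferred range)
```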

As used herein, the "CTP unit" refers to an amino acid
sequence found at the carboxy terminus of human chorionic
gonadotropin β subunit which extends from any one of amino acids
112-118 to residue 145 at the C-terminus, or to a portion thereof.
Thus, each "complete" CTP unit contains 28-34 amino acids, depending
on the N-terminus of the CTP. The native sequence of positions
112-145 is shown in Figure 2.
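
The 28-34 residue range follows directly from the permitted start positions: a unit running from position s to position 145 inclusive has 145 - s + 1 residues. A short sketch of that arithmetic, with illustrative names (`CTP_END`, `CTP_START_RANGE`, `complete_ctp_length`) that are not taken from the specification:

```python
CTP_END = 145                      # C-terminal residue of the hCG beta subunit
CTP_START_RANGE = range(112, 119)  # permitted N-terminal positions 112..118


def complete_ctp_length(start: int) -> int:
    """Number of residues in a complete CTP unit starting at `start`."""
    if start not in CTP_START_RANGE:
        raise ValueError(
            "a complete CTP unit starts at one of positions 112-118")
    return CTP_END - start + 1


if __name__ == "__main__":
    for s in CTP_START_RANGE:
        print(s, complete_ctp_length(s))  # 112 -> 34 ... 118 -> 28
```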

By a "partial" CTP unit is meant an amino acid
sequence which occurs between positions 112-118 to 145
inclusive, but which has at least one amino acid deleted from
the shortest possible "complete" CTP unit (i.e., from positions
118-145). The "partial" CTP units included in the invention
preferably contain at least one O-glycosylation site if agonist
activity is desired. Some nonglycosylated forms of the hormones
are antagonists and are useful as such. The CTP unit contains
four such sites at the serine residues at positions 121 (site
1); 127 (site 2); 132 (site 3); and 138 (site 4). The partial
forms of CTP useful in agonists of the invention will contain
one or more of these sites arranged in the order in which they
appear in the native CTP sequence. Thus, the "partial" CTP unit
employed in agonists of the invention may include all four
glycosylation sites; sites 1, 2 and 3; sites 1, 2 and 4; sites
1, 3 and 4; sites 2, 3 and 4; or simply sites 1 and 2; 1 and 3;
1 and 4; 2 and 3; 2 and 4; or 3 and 4; or may contain only one
of sites 1, 2, 3 or 4.
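
The site combinations listed above are the non-empty subsets of the four O-glycosylation sites, kept in native order. The sketch below enumerates them and also reports which sites a contiguous fragment retains; the names `O_GLYCOSYLATION_SITES`, `site_combinations` and `retained_sites` are illustrative assumptions, and combinations that skip a site (such as sites 1, 2 and 4) would require an internal deletion or substitution that the contiguous-fragment helper does not model.

```python
from itertools import combinations

# Serine positions of the four O-linked glycosylation sites in the CTP.
O_GLYCOSYLATION_SITES = {1: 121, 2: 127, 3: 132, 4: 138}


def site_combinations():
    """All non-empty subsets of sites 1-4, largest first, in native order."""
    sites = sorted(O_GLYCOSYLATION_SITES)
    return [combo
            for size in range(len(sites), 0, -1)
            for combo in combinations(sites, size)]


def retained_sites(start: int, end: int):
    """Site numbers whose serine lies in the contiguous fragment start..end."""
    return [n for n, pos in O_GLYCOSYLATION_SITES.items()
            if start <= pos <= end]


if __name__ == "__main__":
    print(len(site_combinations()))  # 1 + 4 + 6 + 4 = 15 subsets
    print(retained_sites(125, 135))  # sites 2 and 3
```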

By "tandem" inserts or extensions is meant that the
insert or extension contains at least two "CTP units". Each CTP
unit may be complete or a fragment, and native or a variant.
All of the CTP units in the tandem extension or insert may be
identical, or they may be different from each other. Thus, for
example, the tandem extension or insert may generically be
partial-complete; partial-partial; partial-complete-partial;
complete-complete-partial, and the like wherein each of the
noted partial or complete CTP units may independently be either
a variant or the native sequence.

The "linker moiety" is a moiety that joins the α and β 
sequences without interfering with the activity that would 
otherwise be exhibited by the same α and β chains as members of 
a heterodimer, or which alters that activity to convert it from 
agonist to antagonist activity. The level of activity may 
change within a reasonable range, but the presence of the linker 
cannot be such as to deprive the single-chain form of both 
substantial agonist and substantial antagonist activity. The 
single-chain form must remain as a single-chain form when it is 
recovered from its production medium and must exhibit activity 
pertinent to the hormonal activity of the heterodimer, the 
elements of which form its components. 

Variants 

The hormone subunits and the CTP units may correspond 
exactly to the native hormone or CTP sequence, or may be 
variants. The nature of the variants has been defined 
hereinabove. In such variants, 1-10, preferably 1-8, and most 
preferably 1-5 of the amino acids contained in the native 
sequence are substituted by a different amino acid compared to 
the native amino acid at that position, or 1-10, more preferably 
1-8 and most preferably 1-5 amino acids are simply deleted, or a 
combination of these. As pointed out above, when non-critical 
regions of the single chain forms are identified, in particular, 
through detecting the presence of non-critical "loops", the 
number of amino acids altered by deletion or substitution may be 
increased to 20 or 30 or any arbitrary number depending on the 
length of amino acid sequence in the relevant non-critical 
region. Of course, deletions or substitutions in more than one 
non-critical region result in still greater numbers of amino 
acids in the single chain forms being affected, and substitution 
and deletion strategies may be used in combination. The 
substitutions or deletions taken cumulatively do not result in 
substantial elimination of agonist or antagonist activity 
associated with the hormone. Substitutions by conservative 
analogs of the native amino acid are preferred. 

"Conservative analog" means, in the conventional 
sense, an analog wherein the residue substituted is of the same 
general amino acid category as that for which substitution is 
made. Amino acids have been classified into such groups, as is 
understood in the art, by, for example, Dayhoff, M. et al., 
Atlas of Protein Sequences and Structure (1972) 5:89-99. In 
general, acidic amino acids fall into one group; basic amino 
acids into another; neutral hydrophilic amino acids into 
another; and so forth. 

More specifically, amino acid residues can be 
generally subclassified into four major subclasses as follows: 

Acidic: The residue has a negative charge due to loss 
of H ion at physiological pH and the residue is attracted by 
aqueous solution so as to seek the surface positions in the 
conformation of a peptide in which it is contained when the 
peptide is in aqueous medium at physiological pH. 




Basic: The residue has a positive charge due to 
association with H ion at physiological pH and the residue is 
attracted by aqueous solution so as to seek the surface 
positions in the conformation of a peptide in which it is 
contained when the peptide is in aqueous medium at physiological 
pH. 

Neutral/nonpolar: The residues are not charged at 
physiological pH and the residue is repelled by aqueous solution 
so as to seek the inner positions in the conformation of a 
peptide in which it is contained when the peptide is in aqueous 
medium. These residues are also designated "hydrophobic" 
herein. 

Neutral/polar: The residues are not charged at 
physiological pH, but the residue is attracted by aqueous 
solution so as to seek the outer positions in the conformation 
of a peptide in which it is contained when the peptide is in 
aqueous medium. 

It is understood, of course, that in a statistical 
collection of individual residue molecules some molecules will 
be charged, and some not, and there will be an attraction for or 
repulsion from an aqueous medium to a greater or lesser extent. 
To fit the definition of "charged," a significant percentage (at 
least approximately 25%) of the individual molecules are charged 
at physiological pH. The degree of attraction or repulsion 
required for classification as polar or nonpolar is arbitrary 
and, therefore, amino acids specifically contemplated by the 
invention have been classified as one or the other. Most amino 
acids not specifically named can be classified on the basis of 
known behavior. 

Amino acid residues can be further subclassified as 
cyclic or noncyclic, and aromatic or nonaromatic, 
self-explanatory classifications with respect to the side chain 
substituent groups of the residues, and as small or large. The 
residue is considered small if it contains a total of 4 carbon 
atoms or less, inclusive of the carboxyl carbon. Small residues 
are, of course, always nonaromatic. 

For the naturally occurring protein amino acids, 
subclassification according to the foregoing scheme is as 
follows. 

Acidic: Aspartic acid and Glutamic acid; 
Basic/noncyclic: Arginine, Lysine; 
Basic/cyclic: Histidine; 
Neutral/polar/small: Glycine, Serine, Cysteine; 
Neutral/nonpolar/small: Alanine; 
Neutral/polar/large/nonaromatic: Threonine, Asparagine, Glutamine; 
Neutral/polar/large/aromatic: Tyrosine; 
Neutral/nonpolar/large/nonaromatic: Valine, Isoleucine, Leucine, Methionine; 
Neutral/nonpolar/large/aromatic: Phenylalanine, and Tryptophan. 

The gene-encoded secondary amino acid proline, 
although technically within the group neutral/nonpolar/ 
large/cyclic and nonaromatic, is a special case due to its known 
effects on the secondary conformation of peptide chains, and is 
not, therefore, included in this defined group. 
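
The subclassification listed above can be written as a lookup table, which makes it straightforward to test whether a proposed substitution stays within one subclass. This is a sketch only: the three-letter codes, the names `CLASSES`, `classify` and `is_conservative`, and the choice to treat "same subclass" as the test for a conservative analog are illustrative assumptions, not a definition taken from the text.

```python
# Subclassification of the gene-encoded amino acids as listed in the text.
# Proline is deliberately absent because the text treats it as a special case.
CLASSES = {
    "acidic": {"Asp", "Glu"},
    "basic/noncyclic": {"Arg", "Lys"},
    "basic/cyclic": {"His"},
    "neutral/polar/small": {"Gly", "Ser", "Cys"},
    "neutral/nonpolar/small": {"Ala"},
    "neutral/polar/large/nonaromatic": {"Thr", "Asn", "Gln"},
    "neutral/polar/large/aromatic": {"Tyr"},
    "neutral/nonpolar/large/nonaromatic": {"Val", "Ile", "Leu", "Met"},
    "neutral/nonpolar/large/aromatic": {"Phe", "Trp"},
}


def classify(residue):
    """Return the subclass name for a residue, or None if it is not listed."""
    for name, members in CLASSES.items():
        if residue in members:
            return name
    return None


def is_conservative(native, substitute):
    """True when both residues fall in the same listed subclass (assumed criterion)."""
    native_class = classify(native)
    return native_class is not None and native_class == classify(substitute)
```

Under this assumed criterion, replacing leucine with isoleucine counts as conservative, while replacing aspartic acid with lysine does not; proline is never matched because it is left unclassified.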

If the single-chain proteins of the invention are 
constructed by recombinant methods, they will contain only 
gene-encoded amino acid substitutions; however, if any portion is 
synthesized by standard, for example, solid phase, peptide 
synthesis methods and ligated, for example, enzymatically, into 
the remaining protein, non-gene-encoded amino acids, such as 
aminoisobutyric acid (Aib), phenylglycine (Phg), and the like 
can also be substituted for their analogous counterparts. 

These non-encoded amino acids also include, for 
example, β-alanine (β-Ala), or other omega-amino acids, such as 
3-aminopropionic, 4-aminobutyric and so forth, sarcosine 
(Sar), ornithine (Orn), citrulline (Cit), t-butylalanine 
(t-BuA), t-butylglycine (t-BuG), N-methylisoleucine (N-MeIle), 
and cyclohexylalanine (Cha), norleucine (Nle), cysteic acid 
(Cya); 2-naphthylalanine (2-Nal); 1,2,3,4-tetrahydroisoquinoline- 
3-carboxylic acid (Tic); mercaptovaleric acid (Mvl); 
β-2-thienylalanine (Thi); and methionine sulfoxide (MSO). These 
also fall conveniently into particular categories. 
Based on the above definitions, 

Sar and β-Ala and Aib are neutral/nonpolar/small; 
t-BuA, t-BuG, N-MeIle, Nle, Mvl and Cha are neutral/nonpolar/large/nonaromatic; 
Orn is basic/noncyclic; 
Cya is acidic; 
Cit, Acetyl Lys, and MSO are neutral/polar/large/nonaromatic; and 
Phg, Nal, Thi and Tic are neutral/nonpolar/large/aromatic. 

The various omega-amino acids are classified according 
to size as neutral/nonpolar/small (β-Ala, i.e., 
3-aminopropionic, 4-aminobutyric) or large (all others). 

Thus, amino acid substitutions other than those 
encoded in the gene can also be included in peptide compounds 
within the scope of the invention and can be classified within 
this general scheme according to their structure. 

Preferred Embodiments of the Single-Chain Hormones 

The single-chain hormones of the invention are most 
efficiently and economically produced using recombinant 
techniques. Therefore, those forms of α and β chains, CTP units 
and other linker moieties which include only gene-encoded amino 
acids are preferred. It is possible, however, as set forth 
above, to construct at least portions of the single-chain 
hormones using synthetic peptide techniques or other organic 
synthesis techniques and therefore variants which contain 
nongene-encoded amino acids are also within the scope of the 
invention. 

In the most preferred embodiments of the single-chain 
hormones of the invention, the C-terminus of the β subunit is 
covalently linked, optionally through a linker, to the 
N-terminus of the mature α subunit; forms wherein the C-terminus 
of the α subunit is linked to the N-terminus of the β subunit 
are also useful, but may have less activity either as 
antagonists or agonists of the relevant receptor. The linkage 
can be a direct peptide linkage wherein the C-terminal amino 
acid of one subunit is directly linked through the peptide bond 
to the N-terminus of the other; however, in many instances it is 
preferable to include a linker moiety between the two termini. 
In many instances, the linker moiety will provide at least one β 
turn between the two chains. The presence of proline residues 
in the linker may therefore be advantageous. 

As described above, the N-terminus of the α chain may 
also be coupled to the N-terminus of the β chain, or the 
C-terminus of the α to the C-terminus of the β chain, in any case 
through a linker unit. Similar combinations are included in the 
single chain forms comprising two α or two β subunits. 

It should be understood that in discussing linkages 
between the termini of the subunits comprising the single chain 
forms, one or more termini may be altered by substitution and/or 
deletion as described above. 

Preferred embodiments of the single-chain compounds 
containing two β subunits are those wherein the C-terminus of 
one unit is linked to the N-terminus of the other, optionally 
through a linker, preferably a peptide linker. Also possible, 
as described above, are linkages between the N-termini of the 
two β chains or linkages between the C-termini of the two β 
chains; in these cases, of course, a linker is required. 

While the head-to-head, tail-to-tail and head-to-tail 
configurations of both the single-chain heterodimer and the 
single-chain two-β subunit form have been described, the linkage 
between the two subunits may also occur at positions not 
precisely at the N- or C-terminus of each member but at 
positions proximal thereto. 

In one particularly preferred set of embodiments, the 
linkage is head-to-tail and the linker moiety will include one 
or more CTP units and/or variants or truncated forms thereof. 
Preferred forms of the CTP units used in such linker moieties 
are described hereinbelow. 

Further, the linker moiety may include a drug 
covalently, preferably releasably, bound to the linker moiety. 
Means for coupling the drug to the linker moiety and for 
providing for its release are conventional. 

In addition to their occurrence in the linker moiety, 
CTP and its variants and truncations may also be included in any 
noncritical region of the subunits making up the single-chain 
hormone. The nature of these inclusions, and their positions, 
is set forth in detail in the parent application herein. 

While CTP units are preferred inclusions in the linker 
moiety, it is understood that the linker may be any suitable 
covalently bound material which provides the appropriate spatial 
relationship between the α and β subunits. Thus, for 
head-to-tail configurations the linker may generally be a peptide 
comprising an arbitrary number, but typically less than 100, 
more preferably less than 50 amino acids which has the proper 
hydrophilicity/hydrophobicity ratio to provide the appropriate 
spacing and conformation in solution. In general, the linker 
should be on balance hydrophilic so as to reside in the 
surrounding solution and out of the way of the interaction 
between the α and β subunits or the two β subunits. It is 
preferable that the linker include β turns typically provided by 
proline residues. Any suitable polymer, including peptide 
linkers, with the above-described correct characteristics may be 
used. 
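
The qualitative criteria for a peptide linker (fewer than 100 residues, hydrophilic on balance, proline present) can be turned into a rough screening heuristic. The sketch below is an illustration only and makes several assumptions not found in the text: it uses the published Kyte-Doolittle hydropathy scale, treats a negative mean hydropathy as "on balance hydrophilic", and uses one-letter residue codes. The function names and the example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average Kyte-Doolittle score over a one-letter-code sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[r] for r in seq) / len(seq)


def linker_report(seq, max_len=100):
    """Screen a candidate peptide linker against the three stated criteria."""
    return {
        "length_ok": 0 < len(seq) < max_len,       # "typically less than 100"
        "hydrophilic": mean_hydropathy(seq) < 0,   # assumed cutoff for "on balance hydrophilic"
        "has_proline": "P" in seq,                 # proline as a source of beta turns
    }


# Hypothetical example sequences, chosen only to exercise the checks.
report = linker_report("GSPSGSPSG")
rejected = linker_report("LLLLVVVV")
```

Passing `max_len=50` applies the "more preferably less than 50" limit instead. A screen of this kind says nothing about activity, which would still have to be established experimentally.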

One particular linker moiety that is not included 
within the scope of the invention is that which includes a 
signal peptide immediately upstream of the downstream subunit. 

Particularly preferred embodiments of the single-chain 
hormones of the invention include: 

βFSH-α; βFSH-βFSH; βFSH-βLH; 
βLH-α; βLH-βLH; βLH-βFSH; 
βTSH-α; βTSH-βTSH; βTSH-βFSH; 
βCG-α; βCG-βCG; βCG-βFSH; βCG-βTSH; 
βFSH-CTP-α; βFSH-CTP-βFSH; βFSH-CTP-βLH; 
βLH-CTP-α; βLH-CTP-βLH; βLH-CTP-βFSH; 
βCG-CTP-α; βFSH-CTP-CTP-βLH; 
βFSH-CTP-CTP-α; βFSH-CTP-CTP-βTSH; 
βLH-CTP-CTP-α; βLH-CTP-CTP-βLH; 
βCG-CTP-CTP-α; βCG-CTP-CTP-βLH; 
α-α; α-CTP-α; and α-CTP-CTP-α 

and the like. Also particularly preferred are the human forms 
of the subunits. In the above constructions, "CTP" refers to 
CTP or its variants or truncations as further explained in the 
paragraph below. 

Preferred Embodiments of CTP Units 

The notation used for the CTP units of the invention 
is as follows: for portions of the complete CTP unit, the 
positions included in the portion are designated by their number 
as they appear in Figure 2 herein. Where substitutions occur, 
the substituted amino acid is provided along with a superscript 
indicating its position. Thus, for example, CTP (120-143) 
represents that portion of CTP extending from positions 120 to 
143; CTP (120-130; 136-143) represents a fused amino acid 
sequence lacking positions 118-119, 131-135, and 144-145 of the 
native sequence. CTP (Arg¹²²) refers to a variant wherein the 
lysine at position 122 is substituted by an arginine; CTP 
(Ile¹³⁴) refers to a variant wherein the leucine at position 134 
is substituted by isoleucine. CTP (Val¹²⁸ Val¹⁴³) represents a 
variant wherein two substitutions have been made, one for the 
leucine at position 128 and the other for the isoleucine at 
position 142. CTP (120-143; Ile¹²⁸ Ala¹³⁰) represents the 
relevant portion of the CTP unit where the two indicated 
substitutions have been made. 
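
The range part of this notation can be checked mechanically. As a minimal sketch (the function `omitted_ranges` and its defaults are illustrative assumptions, with 118-145 taken as the reference span used in the example above), the positions absent from a given CTP portion can be computed as:

```python
def omitted_ranges(included, first=118, last=145):
    """Return (start, end) runs of positions in first..last not covered by `included`.

    `included` is a list of inclusive (start, end) position ranges,
    e.g. [(120, 130), (136, 143)] for CTP (120-130; 136-143).
    """
    kept = set()
    for start, end in included:
        kept.update(range(start, end + 1))

    runs = []
    for pos in range(first, last + 1):
        if pos in kept:
            continue
        if runs and runs[-1][1] == pos - 1:
            # Extend the current run of omitted positions.
            runs[-1] = (runs[-1][0], pos)
        else:
            runs.append((pos, pos))
    return runs


example = omitted_ranges([(120, 130), (136, 143)])
```

For CTP (120-130; 136-143) this gives the runs 118-119, 131-135 and 144-145, in agreement with the positions stated to be lacking.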

Also preferred among variants of CTP are those wherein 
one or more of the O-linked glycosylation sites have been 
altered or deleted. One particularly preferred means of 
altering the site to prevent glycosylation is substitution of an 
alanine residue for the serine residue in these sites. 

Particularly preferred are those CTP units of the 
following formulas: 

#1 CTP (116-132) 
#2 CTP (118-128; 130-135) 
#3 CTP (117-142) 
#4 CTP (116-130) 
#5 CTP (116-123; 137-145) 
#6 CTP (115-133; 141-145) 
#7 CTP (117-140, Ser¹²³ Gln¹⁴⁰) 
#8 CTP (125-143, Ala¹³⁰) 
#9 CTP (135-145, ...) 
#10 CTP (131-143, ...) 
#11 CTP (118-132) 
#12 CTP (118-127) 
#13 CTP (118-145) 
#14 CTP (115-132) 
#15 CTP (115-127) 
#16 CTP (115-145) 
#17 CTP (112-145) 
#18 CTP (112-132) 
#19 CTP (112-127) 


Preferred Embodiments of the α and β Subunits 

Of course, the native forms of the α and β subunits in 
the single-chain form are among the preferred embodiments. 
However, certain variants are also preferred. 

In particular, variants of the α subunit in which the 
N-linked glycosylation site at position 52 is eliminated or 
altered by amino acid substitutions at or proximal to this site 
are preferred for antagonist activity. Similar modifications at 
the glycosylation site at position 78 are also preferred. 
Deletion of one or more amino acids at positions 85-92 also 
affects the nature of the activity of hormones containing the α 
subunit and substitution or deletion of amino acids at these 
positions is also among the preferred embodiments. 

Similarly, the N-linked glycosylation sites in the β 
chain can conveniently be modified to eliminate glycosylation 
and thus affect the agonist or antagonist activity of the β 
chains. If CTP is present, either natively as in CG or by 
virtue of being present as a linker, the O-linked glycosylation 
sites in this moiety may also be altered. 

Particular variants containing modified or deleted 
glycosylation sites are set forth in Yoo, J. et al. J Biol Chem 
(1993) 268:13034-13042; Yoo, J. et al. J Biol Chem (1991) 
266:17741-17743; and Bielinska, M. et al. J Cell Biol (1990) 
111:330a (all cited above) and in Matzuk, M.M. et al. J Biol 
Chem (1989) 264:2409-2414; Keene, J.L. et al. J Biol Chem (1989) 
264:4769-4775; and Keene, J.L. et al. Mol Endocrinol (1989) 
3:2011-2017. 

Not only may the glycosylation sites per se be 
modified directly, but positions proximal to these sites are 
preferentially modified so that the glycosylation status of the 
mutant will be affected. For the α subunit, for example, 
variants in which amino acids between positions 50-60 are 
substituted, including both conservative and nonconservative 
substitutions, are favored, especially substitutions at 
positions 51, 53 and 55 because of their proximity to the 
glycosylation site at Asn⁵². 

Also preferred are mutants of the α subunit wherein 
lysine at position 91 is converted to methionine or glutamic 
acid. 

Although the variants have been discussed in terms of 
variations in the individual subunits hereinabove, it will be 
recalled that the single chain forms of the dimer offer 
additional opportunities for modification. Specifically, 
regions that are critical to folding of the dimer may not be 
critical to the correct conformation of the single chain 
molecule and these regions are available for variation in the 
single chain form, although not described above in terms of 
individual members of the dimeric forms. Further, the single 
chain forms may be modified dramatically in the context of 
non-critical regions whose alteration and/or deletion do not affect 
the biological activity as described above. 

Suitable Drugs 

Suitable drugs that may be included in the linker 
moiety include peptides or proteins such as insulin-like growth 
factors; epidermal growth factors; acidic and basic fibroblast 
growth factors; platelet-derived growth factors; the various 
colony stimulating factors, such as granulocyte CSF, 
macrophage-CSF, and the like; as well as the various cytokines such as 
IL-2, IL-3 and the plethora of additional interleukin proteins; the 
various interferons; tumor necrosis factor; and the like. 
Peptide- or protein-based drugs have the advantage that they can 
be included in the single chain and the entire construct can 
readily be produced by recombinant expression of a single gene. 
Also, small molecule drugs such as antibiotics, 
antiinflammatories, toxins, and the like can be used. 

In general, the drugs included within the linker 
moiety will be those desired to act in the proximity of the 
receptors to which the hormones ordinarily bind. Suitable 
provision for release of the drug from inclusion within the 
linker will be provided, for example, by also including sites 
for enzyme-catalyzed lysis as further described under the 
section headed Preparation Methods hereinbelow. 

Other Modifications 

The single-chain proteins of the invention may be 
further conjugated or derivatized in ways generally understood 
to derivatize amino acid sequences, such as phosphorylation, 
glycosylation, deglycosylation of ordinarily glycosylated forms, 
modification of the amino acid side chains (e.g., conversion of 
proline to hydroxyproline) and similar modifications analogous 
to those post-translational events which have been found to 
occur generally. 

The glycosylation status of the hormones of the 
invention is particularly important. The hormones may be 
prepared in nonglycosylated form either by producing them in 
procaryotic hosts or by mutating the glycosylation sites 
normally present in the subunits and/or any CTP units that may 
be present. Both nonglycosylated versions and partially 
glycosylated versions of the hormones can be prepared by 
manipulating the glycosylation sites. Normally glycosylated 
versions are, of course, also included within the scope of the 
invention. 

As is generally known in the art, the single-chain 
proteins of the invention may also be coupled to labels, 
carriers, solid supports, and the like, depending on the desired 
application. The labeled forms may be used to track their 
metabolic fate; suitable labels for this purpose include, 
especially, radioisotope labels such as iodine 131, technetium 
99, indium 111, and the like. The labels may also be used to 
mediate detection of the single-chain proteins in assay systems; 
in this instance, radioisotopes may also be used as well as 
enzyme labels, fluorescent labels, chromogenic labels, and the 
like. The use of such labels is particularly helpful for these 
proteins since they are targeting agents, i.e., receptor ligands. 

The proteins of the invention may also be coupled to 
carriers to enhance their immunogenicity in the preparation of 
antibodies specifically immunoreactive with these new modified 
forms. Suitable carriers for this purpose include keyhole 
limpet hemocyanin (KLH), bovine serum albumin (BSA) and 
diphtheria toxoid, and the like. Standard coupling techniques 
for linking the modified peptides of the invention to carriers, 
including the use of bifunctional linkers, can be employed. 

Similar linking techniques, along with others, may be 
employed to couple the proteins of the invention to solid 
supports. When coupled, these proteins can then be used as 
affinity reagents for the separation of desired components with 
which specific reaction is exhibited. 

Preparation Methods 

Methods to construct the proteins of the invention are 
well known in the art. As set forth above, if only gene-encoded 
amino acids are included, and the single chain is in a 
head-to-tail configuration, the most practical approach at present is to 
synthesize these materials recombinantly by expression of the 
DNA encoding the desired protein. DNA containing the nucleotide 
sequence encoding the single-chain forms, including variants, 
can be prepared from native sequences. Techniques for 
site-directed mutagenesis, ligation of additional sequences, PCR, and 
construction of suitable expression systems are all, by now, 
well known in the art. Portions or all of the DNA encoding the 
desired protein can be constructed synthetically using standard 
solid phase techniques, preferably to include restriction sites 
for ease of ligation. Suitable control elements for 
transcription and translation of the included coding sequence 
can be provided to the DNA coding sequences. As is well known, 
expression systems are now available compatible with a wide 
variety of hosts, including procaryotic hosts such as bacteria 
and eucaryotic hosts such as yeast, plant cells, insect cells, 
mammalian cells, avian cells, and the like. 

The choice of host is particularly pertinent to 
posttranslational events, most particularly including 
glycosylation. The location of glycosylation is mostly 
controlled by the nature of the glycosylation sites within the 
molecule; however, the nature of the sugars occupying this site 
is largely controlled by the nature of the host. Accordingly, a 
fine-tuning of the properties of the hormones of the invention 
can be achieved by proper choice of host. 

A particularly preferred form of gene for the α 
subunit portion, whether the α subunit is modified or 
unmodified, is the "minigene" construction. 

As used herein, the α subunit "minigene" refers to the 
gene construction disclosed in Matzuk, M.M., et al., Mol 
Endocrinol (1988) 2:95-100, in the description of the 
construction of pM²/CGα or pM²/α. This "minigene" is 
characterized by retention only of the intron sequence between 
exon 3 and exon 4, all upstream introns having been deleted. In 
the particular construction described, the N-terminal coding 
sequences which are derived from exon 2 and a portion of exon 3 
are supplied from cDNA and are ligated directly through an XbaI 
restriction site into the coding sequence of exon 3 so that the 
introns between exons I and II and between exons II and III are 
absent. However, the intron between exons III and IV as well as 
the signals 3' of the coding sequence are retained. The 
resulting minigene can conveniently be inserted as a BamHI/BglII 
segment. Other means for construction of a comparable minigene 
are, of course, possible and the definition is not restricted to 
the particular construction wherein the coding sequences are 
ligated through an XbaI site. However, this is a convenient 
means for the construction of the gene, and there is no 
particular advantage to other approaches, such as synthetic or 
partially synthetic preparation of the gene. The definition 
includes those coding sequences for the α subunit which retain 
the intron between exons III and IV, or any other intron and 
preferably no other introns. 

For recombinant production, modified host cells using 
expression systems are used and cultured to produce the desired 
protein. These terms are used herein as follows: 

A "modified" recombinant host cell, i.e., a cell 
"modified to contain" the recombinant expression systems of 
the invention, refers to a host cell which has been altered to 
contain this expression system by any convenient manner of 
introducing it, including transfection, viral infection, and so 
forth. "Modified" refers to cells containing this expression 
system whether the system is integrated into the chromosome or 
is extrachromosomal. The "modified" cells may either be stable 
with respect to inclusion of the expression system or not. In 
short, "modified" recombinant host cells with the expression 
system of the invention refers to cells which include this 
expression system as a result of their manipulation to include 
it, when they natively do not, regardless of the manner of 
effecting this incorporation. 

"Expression system" refers to a DNA molecule which 
includes a coding nucleotide sequence to be expressed and those 
accompanying control sequences necessary to effect the 
expression of the coding sequence. Typically, these controls 
include a promoter, termination regulating sequences, and, in 
some cases, an operator or other mechanism to regulate 
expression. The control sequences are those which are designed 
to be functional in a particular target recombinant host cell 
and therefore the host cell must be chosen so as to be 
compatible with the control sequences in the constructed 
expression system. 

If secretion of the protein produced is desired, 
additional nucleotide sequences encoding a signal peptide are 
also included so as to produce the signal peptide operably 
linked to the desired single-chain hormone to produce the 
preprotein. Upon secretion, the signal peptide is cleaved to 
release the mature single-chain hormone. 

As used herein "cells," "cell cultures," and "cell 
lines" are used interchangeably without particular attention to 
nuances of meaning. Where the distinction between them is 
important, it will be clear from the context. Where any can be 
meant, all are intended to be included. 

The protein produced may be recovered from the lysate 
of the cells if produced intracellularly, or from the medium if 
secreted. Techniques for recovering recombinant proteins from 
cell cultures are well understood in the art, and these proteins 
can be purified using known techniques such as chromatography, 
gel electrophoresis, selective precipitation, and the like. 

All or a portion of the hormones of the invention may 
be synthesized directly using peptide synthesis techniques known 
in the art. Synthesized portions may be ligated, and release 
sites for any drug contained in the linker moiety introduced by 
standard chemical means. For those embodiments which contain 
amino acids which are not encoded by the gene and those 
embodiments wherein the head-to-head or tail-to-tail 
configuration is employed, of course, the synthesis must be at 
least partly at the protein level. Head-to-head junctions at 
the natural N-termini or at positions proximal to the natural 
N-termini may be effected through linkers which contain 
functional groups reactive with amino groups, such as 
dicarboxylic acid derivatives. Tail-to-tail configurations at 
the C-termini or positions proximal to the C-termini may be 
effected through linkers which are diamines, diols, or 
combinations thereof. 

Antibodies 

The proteins of the invention may be used to generate 
antibodies specifically immunoreactive with these new compounds. 
These antibodies are useful in a variety of diagnostic and 
therapeutic applications. For example, when the single-chain 
forms of the invention are used therapeutically in either human 
or veterinary contexts, the levels of drug may be monitored 
with these antibodies using conventional immunoassay 
techniques. In addition, since some of the antibodies raised by 
these single-chain forms are cross-reactive with the 
heterodimer, they can be used to diagnose naturally occurring 
levels of the heterodimer. 

The antibodies are generally prepared using standard 
immunization protocols in mammals such as rabbits, mice, sheep 
or rats, and the antibodies are titered as polyclonal antisera 
to assure adequate immunization. The polyclonal antisera can 
then be harvested as such for use in, for example, immunoassays. 
Antibody-secreting cells from the host, such as spleen cells, or 
peripheral blood leukocytes, may be immortalized using known 
techniques and screened for production of monoclonal antibodies 
immunospecific with the proteins of the invention. 

By "immunospecific for the proteins" is meant 
antibodies which are immunoreactive with the single-chain 
proteins, but not with the heterodimers per se within the 
general parameters considered to determine affinity or 
nonaffinity. It is understood that specificity is a relative 
term, and an arbitrary limit could be chosen, such as a 
difference in immunoreactivity of 100-fold or greater. Thus, an 
immunospecific antibody included within the invention is at 
least 100 times more reactive with the single-chain protein than 
with the corresponding heterodimers. 
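
The 100-fold criterion can be expressed as a simple ratio test. The sketch below is illustrative only; the function name, the treatment of a zero heterodimer signal, and the idea that reactivities are supplied as non-negative numbers in the same arbitrary units from a common assay are all assumptions rather than anything specified in the text.

```python
def is_immunospecific(single_chain_reactivity, heterodimer_reactivity, min_fold=100.0):
    """Apply the fold-difference criterion to two reactivity measurements.

    Both values are assumed to be non-negative and in the same arbitrary units.
    """
    if single_chain_reactivity < 0 or heterodimer_reactivity < 0:
        raise ValueError("reactivities must be non-negative")
    if heterodimer_reactivity == 0:
        # No detectable heterodimer binding: any positive single-chain signal qualifies.
        return single_chain_reactivity > 0
    return single_chain_reactivity / heterodimer_reactivity >= min_fold
```

For example, a signal of 500 units against the single-chain form and 2 units against the heterodimer is a 250-fold difference and meets the limit, whereas 50 units against 1 unit does not.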



Formulation 

The proteins of the invention are formulated and 
administered using methods comparable to those known for the 
heterodimers corresponding to the single-chain form. Thus, 
formulation and administration methods will vary according to 
the particular hormone used. However, the dosage level and 
frequency of administration may be altered as compared to the 
heterodimer, especially if CTP units are present, in view of the 
extended biological half-life due to their presence. 

Formulations for proteins of the invention are those
typical of protein or peptide drugs such as found in Remington's
Pharmaceutical Sciences, latest edition, Mack Publishing
Company, Easton, PA. Generally, proteins are administered by
injection, typically intravenous, intramuscular, subcutaneous,
or intraperitoneal injection, or using formulations for
transmucosal or transdermal delivery. These formulations
generally include a detergent or penetrant such as bile salts,
fusidic acids, and the like. These formulations can be
administered as aerosols or suppositories or, in the case of
transdermal administration, in the form of skin patches.

Oral administration is also possible provided the
formulation protects the peptides of the invention from
degradation in the digestive system.

Optimization of dosage regimen and formulation is 
conducted as a routine matter and as generally performed in the 
art. 

Methods of Use 

The single-chain peptides of the invention may be used
in many ways, most evidently as substitutes for the
heterodimeric forms of the hormones. Thus, like the
heterodimers, the agonist forms of the single-chain hormones of
the invention can be used in treatment of infertility, as aids
in in vitro fertilization techniques, and in other therapeutic
methods associated with the native hormones, both in humans and
in animals.

The single-chain hormones are also useful as reagents
in a manner similar to the heterodimers. 

In addition, the single-chain hormones of the
invention may be used as diagnostic tools to detect the presence
or absence of antibodies with respect to the native proteins in
biological samples. They are also useful as control reagents in
assay kits for assessing the levels of these hormones in various
samples. Protocols for assessing levels of the hormones
themselves or of antibodies raised against them are standard
immunoassay protocols commonly known in the art. Various
competitive and direct assay methods can be used involving a
variety of labeling techniques including radioisotope labeling,
fluorescence labeling, enzyme labeling and the like.

The single-chain hormones of the invention are also 
useful in detecting and purifying receptors to which the native 
hormones bind. Thus, the single-chain hormones of the invention
may be coupled to solid supports and used in affinity 
chromatographic preparation of receptors or antihormone 
antibodies. The resulting receptors are themselves useful in 
assessing hormone activity for candidate drugs in screening 
tests for therapeutic and reagent candidates. 

Finally, the antibodies uniquely reactive with the
single-chain hormones of the invention can be used as
purification tools for isolation of subsequent preparations of
these materials. They can also be used to monitor levels of the
single-chain hormones administered as drugs.

The following examples are intended to illustrate but 
not to limit the invention. 
Example 1

Preparation of DNA Encoding CGβ-α

Figure 1 shows the construction of an insert for an
expression vector wherein the C-terminus of the β-chain of human
CG is linked to the N-terminus of the mature human α subunit.

As shown in Figure 1, the polymerase chain reaction
(PCR) is utilized to fuse the two subunits between exon 3 of CGβ
and exon 2 of the α subunit so that the codon for the carboxy
terminal amino acid of CGβ is fused directly in reading frame to
that of the N-terminal amino acid of the α subunit. This is
accomplished by using a hybrid primer to amplify a fragment
containing exon 3 of CGβ wherein the hybrid primer contains a
"tail" encoding the N-terminal sequence of the α subunit. The
resulting amplified fragment thus contains a portion of exon 2
encoding human CGα.

Independently, a hybrid primer encoding the N-terminal
sequence of the α subunit fused to the codons corresponding to
the C-terminus of CGβ is used as one of the primers to amplify
the α minigene. The two amplified fragments, each now
containing overlapping portions encoding the other subunit, are
together amplified with two additional primers covering the
entire span to obtain the SalI insert.

In more detail, reaction I shows the production of a
fragment containing exon 3 of CGβ and the first four amino acids
of the mature α subunit as well as a SalI site 5'-ward of the
coding sequences. It is obtained by amplifying a portion of the
CGβ genomic sequence which is described by Matzuk, M.M. et al.
Proc Natl Acad Sci USA (1987) 84:6354-6358; Policastro, P. et
al. J Biol Chem (1983) 258:11492-11499.

Primer 1 provides the SalI site and has the sequence:

5'-GGA GGA AGG GTG GTC GAC CTC TCT GGT-3'.
                    SalI

The other primer, primer 2, is complementary to four
codons of the α N-terminal sequence and five codons of the CGβ
C-terminal sequence and has the sequence:

5'-CAC ATC AGG AGC|TTG TGG GAG GAT CGG-3'.
              ←α  |  β→

The resultant amplified segment which is the product
of reaction I thus has a SalI site 5'-ward of the fused coding
region.

In reaction II, an analogous fused coding region is
obtained from the α minigene described hereinabove. Primer 3 is
a hybrid primer containing four codons of the β subunit and five
codons of α and has the sequence:

5'-ATC CTC CCA CAA|GCT CCT GAT GTG CAG-3'.
              ←β  |  α→

Primer 4 contains a SalI site and is complementary to
the extension of exon 4. Primer 4 has the sequence:

5'-TGA GTC GAC ATG ATA ATT CAG TGA TTG AAT-3'.
        SalI

Thus, the products of reactions I and II overlap, and 
when subjected to PCR in the presence of primers 1 and 4 yield 
the desired SalI product as shown in reaction III.
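
The overlap that makes reaction III possible can be checked directly from the four printed primer sequences. The sketch below is illustrative only: the helper names are hypothetical, the codon table is deliberately partial (just the codons found in primer 3), and GTCGAC is used as the SalI recognition sequence.

```python
SALI_SITE = "GTCGAC"  # SalI recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primers as printed in the text (5'->3'), spaces and "|" marker removed.
PRIMERS = {
    1: "GGAGGAAGGGTGGTCGACCTCTCTGGT",
    2: "CACATCAGGAGCTTGTGGGAGGATCGG",
    3: "ATCCTCCCACAAGCTCCTGATGTGCAG",
    4: "TGAGTCGACATGATAATTCAGTGATTGAAT",
}

# Partial standard genetic code: only the codons that occur in primer 3.
CODONS = {"ATC": "I", "CTC": "L", "CCA": "P", "CAA": "Q",
          "GCT": "A", "CCT": "P", "GAT": "D", "GTG": "V", "CAG": "Q"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_common_substring(a, b):
    """Return the longest substring of `a` that also occurs in `b`."""
    best = ""
    for i in range(len(a)):
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] not in b:
                break
            best = a[i:j]
    return best


def translate(seq):
    """Translate an in-frame sequence using the partial table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Primers 2 and 3 anneal to opposite strands, so primer 3 is compared with
# the reverse complement of primer 2 to find the region both products share.
overlap = longest_common_substring(reverse_complement(PRIMERS[2]), PRIMERS[3])
print(len(overlap), overlap)

# The fused reading frame across the junction encoded by primer 3.
print(translate(PRIMERS[3]))

# Primers 1 and 4 should each carry a SalI site for cloning.
print({n: SALI_SITE in PRIMERS[n] for n in (1, 4)})
```

With the sequences as printed, the shared region is 24 nucleotides long. Primer 3 reads in frame as ILPQ followed by APDVQ, which matches the four-plus-five codon split described for that primer.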

The amplified fragment containing CGβ exon 3 and the α
minigene is inserted into the SalI site of pM²HA-CGβexon1,2, an
expression vector which is derived from pM² containing CGβ exons
1 and 2 in the manner described by Sachais, B., Snider, R.M.,
Lowe, J., Krause, J. J Biol Chem (1993) 268:2319. pM²
containing CGβ exons 1 and 2 is described in Matzuk, M.M. et al.
Proc Natl Acad Sci USA (1987) 84:6354-6358 and Matzuk, M.M. et
al. J Cell Biol (1988) 106:1049-1059.

This expression vector then will produce the single-chain
form of human CG wherein the C-terminus of the β subunit is
directly linked to the N-terminus of the α subunit.

Example 2 

Production and Activity of the Single-Chain Human CG

The expression vector constructed in Example 1 was
transfected into Chinese hamster ovary (CHO) cells and
production of the protein was assessed by immunoprecipitation of
radiolabeled protein on SDS gels. The culture medium was
collected and the bioactivity of the single-chain protein was
compared to the heterodimer in a competitive binding assay with
respect to the human LH receptor. In this assay, the cDNA
encoding the entire human LH receptor was inserted into the
expression vector pCMX (Oikawa, J. X-C et al. Mol Endocrinol
(1991) 5:759-768). Exponentially growing 293 cells were
transfected with this vector using the method of Chen, C. et al.
Mol Cell Biol (1987) 7:2745-2752.

In the assay, the cells expressing human LH receptor
(2 x 10⁵/tube) were incubated with 1 ng of labeled hCG in
competition with the sample to be tested at 22°C for 18 hours.
The samples were then diluted 5-fold with cold Dulbecco's PBS (2
ml) supplemented with 0.1% BSA and centrifuged at 800 x g for 15
minutes. The pellets were washed twice with D's PBS and
radioactivity was determined with a gamma counter. Specific
binding was 10-12% of the total labeled (iodinated) hCG added in
the absence of sample. The decrease in label in the presence of
sample measures the binding ability in the sample. In this
assay, with respect to the human LH receptor in 293 cells, the
wild-type hCG had an ED50 of 0.47 ng and the single-chain protein
had an ED50 of 1.1 ng.

In an additional assay for agonist activity,
stimulation of cAMP production was assessed. In this case, 293
cells expressing human LH receptors (2 x 10⁵/tube) were
incubated with varying concentrations of the heterodimeric hCG
or single-chain hCG and cultured for 18 hours. The
extracellular cAMP levels were determined by specific
radioimmunoassay as described by Davoren, J.B. et al. Biol
Reprod (1985) 33:37-52. In this assay, the wild-type had an ED50
of 0.6 ng/ml and the single-chain form had an ED50 of 1.7 ng/ml.
(ED50 is the dose that produces 50% of the maximal effect.)
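
To put the two assays on a common footing, the reported values can be expressed as a fold difference between the single-chain form and the wild-type heterodimer. This is a small illustrative calculation using only the four figures quoted above; the dictionary layout and function name are hypothetical.

```python
# ED50 values as reported in the text for Example 2.
REPORTED_ED50 = {
    "LH receptor binding (ng)": {"wild_type": 0.47, "single_chain": 1.1},
    "cAMP stimulation (ng/ml)": {"wild_type": 0.6, "single_chain": 1.7},
}


def fold_difference(single_chain, wild_type):
    """Ratio of single-chain ED50 to wild-type ED50 (same units assumed)."""
    if wild_type <= 0:
        raise ValueError("wild-type ED50 must be positive")
    return single_chain / wild_type


for assay, values in REPORTED_ED50.items():
    ratio = fold_difference(values["single_chain"], values["wild_type"])
    print(f"{assay}: single-chain ED50 is {ratio:.1f}x the wild-type ED50")
```

On these figures the single-chain form requires roughly 2.3 times the wild-type dose in the binding assay and roughly 2.8 times in the cAMP assay. That is, the two forms fall within the same order of magnitude.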

Thus, in all cases, the behavior of both the wild-type
and single-chain forms is similar.

Example 3 

Additional Activity Assays 

The medium from CHO cells transfected with an
expression vector for the βFSH-CTP-α single-chain construct was
recovered and assayed as described in Example 2. The results of
the competition assay for binding to FSH receptor are shown in
Figure 3. The results indicate that the single-chain form is
more effective than either wild-type FSH or FSH containing a CTP
extension at the β chain in inhibiting binding of FSH itself to
the receptor. The ED50 for the single-chain form is
approximately 50 mIU/ml while the ED50 for the extended
heterodimer is somewhat over 100 mIU/ml. That for wild-type FSH
is about 120 mIU/ml.

The results of the signal transduction assay are shown
in Figure 4. The effectiveness of all three types of FSH is
comparable.

Construction of Additional Expression Vectors

In a manner similar to that set forth in Example 1,
expression vectors for the production of single-chain FSH,
TSH and LH (βFSH-α, βFSH-CTP-α, βTSH-α, βTSH-CTP-α, βLH-α,
βLH-CTP-α) are prepared and transfected into CHO cells. The
resulting hormones show activities similar to those of the
wild-type form, when assayed as set forth in Example 2.
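
The six constructs listed above follow a regular naming pattern: a β subunit, an optional CTP unit, then the α subunit. The sketch below simply enumerates that pattern. The function and variable names are hypothetical, and treating CTP as an optional middle element is an assumption made for naming purposes only, not a statement about how the vectors are built.

```python
from itertools import product

BETA_SUBUNITS = ["FSH", "TSH", "LH"]
MIDDLE_UNITS = [None, "CTP"]  # None means β is fused directly to α


def construct_name(hormone, middle=None):
    """Build a construct label such as 'βFSH-CTP-α' from its parts."""
    parts = [f"β{hormone}"]
    if middle:
        parts.append(middle)
    parts.append("α")
    return "-".join(parts)


constructs = [construct_name(h, m)
              for h, m in product(BETA_SUBUNITS, MIDDLE_UNITS)]
print(constructs)
```

Enumerating the names this way makes it easy to confirm that every hormone is covered both with and without the CTP unit.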

Similarly, expression vectors for the "two-β" single-chain
forms are constructed in a manner analogous to that set
forth in Example 1 and expressed and assayed as described in
Example 2.
1. A glycosylated or nonglycosylated protein which
comprises:

the amino acid sequence of the α subunit common to the
glycoprotein hormones linked covalently, optionally through a
linker moiety, to the amino acid sequence of the β subunit of
one of said hormones,

wherein said α and β subunits consist of the native
amino acid sequences or variants of said amino acid sequences.

2. The protein of claim 1 wherein said protein
includes said linker moiety, and said linker moiety optionally
includes a drug to be targeted to the receptor for the
glycoprotein hormone.

3. The protein of claim 1 wherein a position
proximal to the C-terminus of the β subunit is linked
covalently, optionally through a linker moiety, to a position
proximal to the N-terminus of the α subunit, or wherein a
position proximal to the C-terminus of the α subunit is linked
covalently, optionally through a linker moiety, to a position
proximal to the N-terminus of the β subunit.

4. The protein of claim 1 wherein the β subunit is
the β subunit of human chorionic gonadotropin or variant
thereof; or

wherein the β subunit is the β subunit of FSH or
variant thereof; or

wherein the β subunit is the β subunit of FSH extended
at a position proximal to its C-terminus by a complete or
partial CTP unit or variant thereof; or

wherein the β subunit is the β subunit of LH or
variant thereof; or

wherein the β subunit is the β subunit of LH extended
at a position proximal to its C-terminus by a complete or
partial CTP unit or variant thereof; or

wherein the β subunit is the β subunit of TSH or
variant thereof; or

wherein the β subunit is the β subunit of TSH extended
at a position proximal to its C-terminus by a complete or
partial CTP unit or variant thereof; and/or

wherein the α subunit is extended at a position
proximal to its N-terminus by a complete or partial CTP unit or
variant thereof.

5. The protein of claim 1 wherein the α subunit or β
subunit or both are modified by the insertion of a complete or
partial CTP unit or variant thereof into a noncritical region
thereof and/or wherein said linker moiety includes a complete or
partial CTP unit or variant thereof.

6. The protein of claim 5 wherein said partial CTP
unit consists of positions 112-132; 115-132; 116-132; or
118-132; or 112-127; 115-127; 116-127; or 118-127; and/or

wherein said CTP has one or more O-linked
glycosylation sites modified or deleted; and/or

wherein said noncritical region is proximal to the
C-terminus; or

wherein said noncritical region is proximal to the
N-terminus.

7. The protein of claim 1 wherein the β subunit
contains a modification in one or more N-linked glycosylation
sites; and/or

wherein one or both of the N-linked glycosylation
sites of the α subunit have been modified; and/or

which is nonglycosylated; and/or

wherein one or more amino acids at positions 85-92 of
the α subunit have been deleted.

8. A glycosylated or nonglycosylated protein which
comprises:

the amino acid sequence of the β subunit of a
glycoprotein hormone linked covalently, optionally through a
linker moiety, to the amino acid sequence of the β subunit of
the same or different glycoprotein hormone,

wherein said β subunits consist of the native amino
acid sequences of said hormones or variants of said amino acid
sequences, or

the amino acid sequence of the α subunit of a
glycoprotein hormone linked covalently, optionally through a
linker moiety, to the amino acid sequence of the α subunit of
the same or different glycoprotein hormone,

wherein said α subunits consist of the native amino
acid sequences of said hormones or variants of said amino acid
sequences.

9. The protein of claim 8 wherein a position
proximal to the C-terminus of one α or β subunit is linked
covalently, optionally through a linker moiety, to a position
proximal to the N-terminus of the other α or β subunit.

10. The protein of claim 8 wherein said protein 
includes a linker moiety. 

11. A pharmaceutical or veterinary composition which 
comprises the protein of claim 1 or 8 in admixture with a 
suitable pharmaceutical excipient. 

12. Antibodies immunospecific for the protein of
claim 1 or 8.

13. A DNA or RNA molecule which comprises a 
nucleotide sequence encoding the protein of claim 1 or 8. 

14. An expression system for production of a single-chain
form of a glycoprotein hormone which expression system
comprises a first nucleotide sequence encoding the protein of
claim 1 or 8 operably linked to control sequences capable of
effecting the expression of said first nucleotide sequence.

15. The expression system of claim 14 which further
contains a second nucleotide sequence encoding a signal peptide
operably linked to the protein encoded by said first nucleotide
sequence.
16. A host cell modified to contain the expression
system of claim 15.

17. A method to produce a single-chain form of a
glycoprotein hormone

which method comprises culturing the cells of claim 16
under conditions wherein said glycoprotein hormone is produced;
and

recovering the glycoprotein hormone from the culture.